Baseline gene expression profiling determines long-term benefit to programmed cell death protein 1 axis blockade

Treatment with immune checkpoint inhibitors has altered the course of malignant melanoma, with approximately half of the patients with advanced disease surviving for more than 5 years after diagnosis. Currently, there are no biomarker methods for predicting outcome from immunotherapy. Here, we obtained transcriptomic information from a total of 105 baseline tumor samples comprising two cohorts of patients with advanced melanoma treated with programmed cell death protein 1 (PD-1)-based immunotherapies. Gene expression profiles were correlated with progression-free survival (PFS) within consecutive clinical benefit intervals (i.e., 6, 12, 18, and 24 months). Elastic net binomial regression models with cross validation were utilized to compare the predictive value of distinct genes across time. Lasso regression was used to generate a signature predicting long-term benefit (LTB), defined as patients who remain alive and free of disease progression at 24 months post treatment initiation. We show that baseline gene expression profiles were consistently able to predict long-term immunotherapy outcomes with high accuracy. The predictive value of different genes fluctuated across consecutive clinical benefit intervals, with a distinct set of genes defining benefit at 24 months compared to earlier outcomes. A 12-gene signature was able to predict LTB following anti-PD-1 therapy with an area under the curve (AUC) equal to 0.92 and 0.74 in the training and validation set, respectively. Evaluation of LTB, via a unique signature may complement objective response classification and characterize the logistics of sustained antitumor immune responses.


INTRODUCTION
With incidence rates rising over the past 40 years, melanoma is projected to cause about 100,000 new cases and 8000 deaths in the United States in 2022 [1][2][3] . Development of monoclonal antibodies targeting the cytotoxic T-lymphocyte antigen-4 (CTLA-4; ipilimumab, approved by the FDA in 2011) and programmed cell death protein 1 (PD-1; nivolumab, pembrolizumab, approved by the FDA in 2014) has dramatically improved outcomes for patients with advanced disease; five-year survival rates have climbed to 52% for previously untreated patients receiving the combination of ipilimumab and nivolumab and 41-44% for those receiving anti-PD-1 monotherapy 4,5 . More importantly, unlike classic chemotherapy or targeted therapies, pivotal clinical trials for immune checkpoint inhibitors (ICIs) have documented a plateau of durable responses apparent at the tail of the survival curves. In the real world, a large fraction of long-term survivors may be off-treatment and have no active disease, having required only immune checkpoint inhibition and at some point, local therapy for residual or oligometastatic disease 6 .
Currently, there are no FDA-approved biomarkers predictive of response to ICIs in patients with melanoma. Baseline assessment of programmed death-ligand 1 (PD-L1) expression by immunohistochemistry, tumor infiltration by CD8-positive T cells, evidence of an inflamed tumor microenvironment (TME) driven by active interferon gamma (IFNγ) signaling detected by gene expression profiling, or high tumor mutational burden (TMB) have shown limited predictive value when tested prospectively or have been challenging to implement in the clinic [7][8][9][10][11][12][13][14][15] . Moreover, the evaluation of objective response rate by standard radiographic Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 systematically fails to capture certain patterns of response to immunotherapy and may underestimate therapeutic benefit from ICIs 16,17 . For patients who improve with immune checkpoint inhibition, categorical definition of benefit is critical. Determination of the optimal duration of treatment is also needed, with implications to treatment-related toxicity, quality of life and health economics; several studies have suggested that a limited rather than continued course of treatment may be sufficient to produce profound responses 18 . Conversely, for those who develop primary or acquired resistance to ICIs, implementation of novel approaches, including new immunotherapy combinations, through clinical trials may prove valuable.
Here, we sought to determine whether long-term benefit (LTB) to immunotherapy represents a biologically distinct entity compared with objective response or short-term outcomes. To do so, we acquired baseline transcriptomic information from two cohorts of patients with melanoma treated with PD-1-based immunotherapies. Instead of objective response classification per RECIST 1.1, we utilized progression-free survival (PFS) exclusively to separate patients who benefit from immune therapy, from those who do not. We provide proof of concept that different gene expression profiles are associated with outcome at subsequent timepoints post treatment initiation. We further develop and validate a signature that can accurately predict the benefit from immunotherapy at 24 months, characterizing a subgroup of long-term survivors that reside in the tail of the survival curves. Such patients are essentially candidates for functional cure and should probably be managed in a different manner from those who show a response by RECIST 1.1.

Cohort characteristics
A total of 105 patients were included in the analysis (Fig. 1). Median age of study participants was 63 years, ranging from 16 to 88 years. Sixty patients were males and 45 were females. Activating BRAF and NRAS mutations were present in 32 (31%) and 18 (17%), respectively. All patients received PD-1-based immunotherapies in the advanced setting. Fifty-eight patients (55%) received anti-PD-1 monotherapy, including 32 treated with pembrolizumab and 26 treated with nivolumab, and 47 patients (45%) received combination immunotherapy with anti-CTLA-4 plus anti-PD-1 (ipilimumab plus nivolumab). It should be noted that 22 patients (21%) had received ipilimumab monotherapy in a previous line of treatment. ORR was 43%. Disease progression at data cutoff was documented in 67 patients (64%). Median PFS for the study cohort was 7.4 months. STB was observed in 42 patients (40%); LTB, calculated at 24 months, was observed in 22 patients (21%). Forty-three patients (41%) survived until data cutoff and median overall survival (OS) for the study cohort was 20.1 months. Detailed cohort characteristics can be seen in Table 1.
Baseline gene expression profiles are consistently able to predict LTB to PD-1 axis blockade To predict clinical benefit from ICIs at different timepoints (i.e., 6, 12, 18, and 24 months post treatment initiation) based on gene expression, we first tested a series of elastic net binomial models with α ∊ [0,1] with increments of 0.1 (α equals '0 'for ridge and '1' for lasso regression; Fig. 2). In general, α = 0.8 appeared to provide consistent AUC values at all four timepoints as well as highest performance for the model predicting STB. Alpha values close to '1' approach a lasso regression model, which reduces the number of genes to avoid overfitting. Overall, our models showed the highest accuracy for predicting clinical benefit at 24 months (LTB), with AUC values consistently over 0.8. These results suggested that different genes account for the prediction of PFS across time.   Fig. 4B) Baseline gene expression profiling is accurate at predicting long-term immunotherapy outcomes. Comparison of elastic net binomial models designed for the prediction of PFS at consecutive 6-month intervals across α values ranging from 0 to 1; α equals "0" for ridge and "1" for lasso regression. Models designed for the prediction of PFS at 24 months perform consistently better in comparison with models designed for the prediction of PFS at 6, 12, and 18 months. The error bars correspond to the 95% confidence intervals. PFS progression-free survival. in the discovery cohort 10,19,20 . Moreover, our optimized model for the prediction of STB had an AUC of 0.75 (95% CI, 0.63-0.87); of the genes included in the LTB model, only CXCL8 and HLA.DQA2 accounted for the prediction of STB. Our LTB model had an accuracy of 0.86, sensitivity of 1, specificity of 0.46, PPV of 0.84, and NPV of 1. The p-value for accuracy versus non-informative rate (NIR) was 0.03. In addition, based on the Youden's index, we identified the optimal cutoff for our LTB scoring (LTB score = 0.32; Fig. 4C, D). Our LTB model performed poorly in the TCGA melanoma dataset (patients not treated with immunotherapy), suggesting lack of prognostic value and immunotherapy specificity ( Supplementary Fig. 1). The 12-gene signature was validated with a second independent immunotherapy-treated melanoma cohort (validation cohort), where it predicted LTB with an AUC of 0.74 (95% CI, 0.52-0.95; Fig. 5A). On the same cohort, the TIS and STB model exhibited an AUC of 0.57 (95% CI, 0.39-0.75) and 0.60 (95% CI, 0.42-0.80), respectively (Fig. 5B). Using the optimal cutoff, patients with high LTB score had significantly prolonged PFS compared with patients with low LTB score in the validation cohort (log rank test; p = 0.012; Fig. 5C, D). Furthermore, our set of 11 genes (excluding HLA.DQA2) was highly predictive of LTB in a third external validation cohort (Gide et al.) with an AUC of 0.87 (95% CI, 0.82-0.92; Supplementary Fig. 2).

DISCUSSION
In this proof-of-concept study, we used a simple, sensitive, and quantitative approach, compatible with tissue-limiting FFPE tumor specimens routinely obtained in the clinic, to capture transcriptomic information portraying the complex interplay between the tumor and the TME. We observed that baseline immune gene expression profiling was more accurate at predicting long-term immunotherapy outcomes (at 24 months post treatment initiation) compared with short-term (<24 months post treatment initiation). Next, we documented a switch in the predictive value of single genes when evaluated at consecutive timepoints; while certain genes are associated with response early in the course of treatment, others appear to dominate for patients with LTB at 24 months. Finally, through a rigorous multistep process, we developed a 12-gene signature that predicted LTB to PD-1-based immunotherapies in patients with melanoma.
The role of TIS in predicting objective response to immunotherapy has been established previously. In their study, Cristescu et al. used two clinical trial cohorts (KEYNOTE-001, KEYNOTE-006) comprising a total of 89 patients with advanced melanoma that were treated with pembrolizumab 20 . They found that TIS was able to predict objective response with an AUC of 0.64. In our study, we used 2 real-world cohorts comprising a total of 105 patients with advanced melanoma that were treated with either of the three FDA-approved regimens for advanced disease (nivolumab, pembrolizumab, nivolumab plus ipilimumab). We showed that TIS was able to predict objective response with an AUC of 0.67 in the discovery cohort (n = 59) and an AUC of 0.57 in the validation cohort (n = 46). Despite any differences in treatment regimens, the combined AUC of both our cohorts would be extremely close to that of Cristescu et al. and this is in support of our findings.
To evaluate benefit following treatment with ICIs we show that conventional objective response classification based on radiographic RECIST 1.1 may be less valuable in thinking about patient gain. Bidimensional tumor shrinkage typically characterizes tumor cell death as a result of cytotoxic chemotherapy or targeted therapies 21 . However, classic RECIST 1.1 do not capture atypical patterns of response, such as mixed response or pseudoprogression, occasionally seen in patients receiving ICIs 17 . In addition, radiographic assessment of response does not always correlate with time to progression 6 . Although complete eradication of tumor lesions may predict prolonged PFS, documentation of a partial response or even stable disease does not preclude sustained antitumor responses 22 . We alternately assessed treatment benefit as a binary variable (i.e., yes, no) for patients who remained progression-free over successive 6-month intervals. While classic chemotherapy might perform comparably, or even better than immunotherapy during the first six months of treatment, immune therapies produce a plateau or flattening at the tail of the survival curves, conferring durable survival benefit in a subset of patients 23,24 . In fact, matching-adjusted indirect comparisons of dual immune checkpoint blockade with ipilimumab and nivolumab versus combined BRAF and MEK inhibition in patients with melanoma show that tentative benefits from immunotherapy emerge after the first 12 months of treatment 25 . In CheckMate 067, PFS curves begin to flatten after~24 months, irrespective of immunotherapy agent administered; unconfined by the effect of subsequent anticancer therapies, the flattening of the PFS curve at 24 months appears quite stable thereafter representing a useful surrogate for long-term immunotherapy outcomes (i.e., PFS at five years post treatment initiation, OS) 4,26-28 .
The fact that the predictive value of certain genes fluctuates based on the timepoint at which PFS is assessed is the most notable finding from the study. Our data indicate a potential positive switch for CD274 at 24 months post treatment initiation. Although PD-L1 positivity by IHC has limited predictive value for objective response classification in patients with melanoma, elevated levels of CD274 mRNA at baseline might explain, to some degree, durable responses to PD-1-based immunotherapies 29 .
Multigene signatures represent an effective approach to characterize the dynamics between the tumor and the TME. In fact, they can simultaneously outline and quantify multiple aspects of the latter that are linked with response to therapy. Using a lasso regression model and stepwise cross validation to minimize the risk of overfitting, we identified a set of genes that can accurately select patients who will experience stable tumor regression. Our final 12-gene LTB signature encompassed two major components of the tumor/TME interaction. The first component pertains to the hallmarks of cancer including relentless proliferation, evasion of growth suppression, resistance to apoptosis, and activation of invasion and metastasis. To this regard, increased baseline PIK3CA expression showed a highpositive association with LTB. Preclinical data support that PD-L1 expression levels are not influenced by oncogenic events in the phosphatidylinositol 3-kinase (PI3K) signaling pathway 30 . However, increased PI3K/AKT pathway activation, as documented in patients with melanoma harboring BRAF V600K mutations, correlated with high tumor mutational load and improved immunotherapy outcomes 31 . The second component appears to relate to the immune response towards the primary tumor. Current literature has linked objective response to immunotherapy with increased local IFNγ production 10 . Our findings clearly suggest a role for preexisting IFNγ signaling as well as enhanced antigen presentation in LTB from PD-1-based immunotherapies. As a matter of fact, IDO1 and PSMB10 that have been implicated in the prediction of objective response by Ayers et al. also take part in the prediction of LTB in our model; interestingly, both genes appear to gain importance over time.
Previous efforts from our group led to the generation of a mixed RNA and protein model (YMMM) for the prediction of best overall response (BOR) to immune checkpoint inhibition in patients with melanoma 32 . Notably, only NRDE2 contributed to the prediction of both BOR and LTB. Nuclear RNAi-defective 2 (NRDE2) is an evolutionarily conserved protein involved in RNA splicing; NRDE2 is responsible for suppressing intron retention with a tropism for pre-mRNAs with short, weak, GC-rich introns 33,34 . In addition, loss of NRDE2 results in severe genomic instability with accumulation of double-strand breaks 35 . In our models, NRDE2 demonstrated an inverse correlation with outcome. Hence NRDE2-dependent genomic events might lead to the aggregation of neoantigens rendering melanoma tumors susceptible to PD-1-based immunotherapies.
From a clinical standpoint, baseline computation of LTB, with further validation, could be a valuable complement to current companion diagnostic tests that focus on objective response classification. It incorporates follow-up information spanning a period of two years and enables simultaneous evaluation of immunotherapy outcomes at a second time point. Thus, assessment of LTB could increase confidence in the point estimate of objective response and, with a maximum sensitivity, identify false negative cases. Also, it may mark candidates for "functional cure", with implications to treatment discontinuation, and others in need of more intensive or alternative treatment approaches 36 . Finally, evaluation of LTB may provide useful insights into the evolving interactions between cancer cells and the host immune system that allow tumors evade recognition according to the immunoediting theory 37 .
This study is subject to several limitations. First, the study cohorts were collected retrospectively. Second, patients received either single agent or combination immunotherapy. The fact that the development of the 12-gene signature was based on previous selection of 770 genes contained in the nCounter PanCancer IO 360™ Panel represents an additional limitation of our study compared to an unbiased examination of the transcriptome. The prognostic value of the genes that comprise the 12-gene signature requires prospective evaluation in randomized trials (versus a "no treatment" arm), which would be challenging or impossible from an ethical perspective. Finally, this is a proof-ofconcept study and further analyses are pending to standardize the 12-gene LTB signature for different normalization methods used in RNA sequencing versus NanoString-obtained datasets and thus, ensure wide applicability across centers.
In conclusion, assessment of LTB allows for the evaluation of benefit at 24 months and provides deeper understanding of the biology relating to treatment with ICIs. Added to the assessment of objective response, it may evolve into a biomarker approach that is tailor-made to immune therapies, representing a paradigm shift in biomarker design.

Patient cohorts
The study cohorts are retrospective collections of melanoma patients with available tissue, treated with PD-1-based immunotherapies in the advanced setting from 2011 to 2017 (discovery cohort) and from 2017 to 2020 (validation cohort) at Yale Cancer Center (New Haven, CT). Patients with uveal melanoma were excluded. Pretreatment formalin-fixed, paraffin-embedded (FFPE) specimens from Yale Pathology archives were reviewed by a board-certified pathologist. Clinicopathological data were collected from clinical records and pathology reports; the data cutoff date was September 1, 2020. RECIST 1.1 were used to classify best overall response as complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD), and to determine the objective response rate (ORR) 16 . PFS was utilized to generate successive, 6-month clinical benefit intervals; short-term benefit (STB) was defined for patients who were alive and free of disease progression within 6 months from treatment initiation and longterm benefit (LTB) for those who were alive and free of disease progression within 24 months from treatment initiation. Patients whose follow-up was shorter than the prespecified intervals were excluded from the analysis. All patients provided written informed consent. The study was approved by the Yale Human Investigation Committee protocol #9505008219 and conducted in accordance with the Declaration of Helsinki.
Clinical data were also downloaded from the Genomic Data Commons (GDC) data portal from The Cancer Genome Atlas (TCGA) Research Network: https://cancergenome.nih.gov. Overall survival (OS) annotation was provided by TCGA for 470 patients with primary melanoma not treated with immunotherapy (TCGA-SKCM).
Finally, publicly available RNA sequencing information from pretreatment samples of patients with advanced melanoma treated with immune checkpoint inhibitors were downloaded to obtain external validation of our LTB gene signature 38 . Given that this dataset used TPM normalization for gene expression (rather than gene count normalization based on a set of core genes from the NanoString nCounter platform), and that HLA.DQA2 gene was absent, we reworked our model's weights to test the predictive value of the remaining 11 genes for LTB.
Quality assessment of FFPE tissue specimens Similar to previous studies, one cut section from each tissue block was routinely stained with hematoxylin and eosin and coverslipped for assessment of sample quality. Each hematoxylin and eosin-stained slide was reviewed by a board-certified pathologist to evaluate the adequacy of tumor cells, the quality of tissue preservation, and whether significant artifacts relating to fixation, processing, or prefixation tissue handling were present. Specimens containing no or minimal tumor tissue were excluded from the analysis 10 .

RNA isolation and gene expression analysis
Total RNA was isolated from 5 μm-thick pretreatment FFPE sections of tumors fixed on positively charged slides using the High Pure FFPET RNA Isolation Kit (Roche) following the manufacturer's protocols. RNA was quantified using the NanoDrop ND1000 spectrophotometer (Thermo Fisher Scientific). Gene expression analysis was conducted on the NanoString nCounter platform (NanoString Technologies). The nCounter PanCancer IO 360™ Panel containing 770 genes related to the tumor, its microenvironment and the antitumor immune response was used. Per sample, 250 ng of total RNA in a final volume of 5 μl was mixed with a 3′ biotinylated capture probe and a 5′ reporter probe and tagged with a fluorescent barcode from the custom gene expression code set. Probes and target transcripts were hybridized at 67°C for 16-24 h per the manufacturer's recommendations. Hybridized samples were run on the NanoString nCounter preparation station using the high-sensitivity protocol, in which excess capture and reporter probes were removed and transcriptspecific ternary complexes were immobilized on a streptavidincoated cartridge. The cartridge was scanned at maximum scan resolution on the nCounter Digital Analyzer 10 .

Normalization
Raw data counts were normalized using the geomean of 10 housekeeping genes included in the nCounter PanCancer IO 360™ Panel and each gene was adjusted based on the average of 2 panel standards across all data. Then, the housekeeper-and panel standard-normalized data were log2 transformed.

Statistical analysis
Our initial dataset consisted of 770 genes. To reduce the initial number of genes, we first implemented a correlation analysis between each gene's expression and clinical benefit at subsequent timepoints (6,12,18, and 24 months post treatment initiation) in our cohort. For each gene correlation, we collected the resulting non-adjusted p-value. Then, we selected those with a non-adjusted p-value of < 0.05 for each timepoint and pooled them together. This resulted in a final list of 58 genes, containing those associated with clinical benefit in at least one timepoint (Supplementary Table 1). Elastic net binomial regression models were inferred using (glmnet) package. Cross validation was performed using (cv.glmnet). For every model we optimized the lambda parameter (λ; nlamda = 1000) using an 8-fold cross validation and maximum AUC. To determine the best AUC across different alpha values (α from 0 to 1 with 0.1 increments) we used 100 replicates. Across all α values, we selected α = 0.8 for our elastic net models since all four models exhibited solid performance (AUC > 0.75), and the performance of the STB model (PFS at 6 months) was highest. To determine model coefficients for α = 0.8 (elastic net) we used 1000 replicates. Gene importance and variable importance plots were inferred using the variable importance plots (vip) package 39 . To determine whether there is a change in gene importance over time, we calculated each gene's importance for every optimized elastic net binomial regression model with α = 0.8, respectively. Then we fitted a linear curve to every gene's importance trajectory across the four clinical benefit intervals. High positive coefficients (slope) and correlation coefficients (Pearson r) indicated inclusion of a gene in the signature for clinical benefit over subsequent intervals. Heatmap plots were inferred using (gplots) and (RColorBrewer) packages 40 . For our final prediction Lasso models (at 6 and 24 months with independent training and validation) we also used 1000 replicates to obtain the best model. Specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and p-value were calculated using the confusion matrix from (caret) package, while ROC curves were drawn using the (ROCR) package 41,42 . Finally, we used the Youden's index to calculate the optimal cutoff and separate patients into high and low risk subgroups. Kaplan-Meier curves and log rank test were implemented using (survival) and (survminer) packages 43,44 . The entire analysis was performed using R 4.1.2.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
The data that support the findings of this study have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE215868. The TCGA-SKCM can be accessed at https://cancergenome.nih.gov. The dataset by Gide et al. used for external validation of our findings can be accessed through the corresponding publication 38 .

CODE AVAILABILITY
For replicating stochasticity, we used "set.seed(14752)". For the cv.glmnet function our models were developed using type.measure = "auc", nfolds = 4, family = binomial, alpha = 1 for lasso regression, otherwise we used alpha Î [0,1) for elastic net or ridge regression and nlambda = 1000. Our best lambda value for our signature model was 0.02069792. The R code that was used for generating our signature model can be provided upon request to corresponding authors or through github (https:// github.com/Lsalichos/LBT_code). Our LTB model coefficients are provided in Supplementary Table 2.